SynthAssess Report

Original Data Sample
age workclass fnlwgt education education-num marital-status occupation relationship race sex capital-gain capital-loss hours-per-week native-country income
31 4 373432 6 5 2 2 0 4 1 0 0 43 38 0
33 4 394727 0 6 4 5 4 2 1 0 0 40 38 0
42 4 228320 15 10 2 12 0 4 1 0 0 50 38 1
58 4 33350 11 9 0 7 4 4 0 0 0 35 38 0
25 4 168943 9 13 2 3 5 4 0 0 0 30 38 1
Synthetic Data Sample
age workclass fnlwgt education education-num marital-status occupation relationship race sex capital-gain capital-loss hours-per-week native-country income
20.6 4.0 100102.9 15.0 9.0 4.0 2.0 3.0 4.0 1.0 87.8 0.0 32.7 38.0 0.0
48.7 4.0 201231.5 15.0 9.0 0.0 9.0 4.0 4.0 0.0 82.9 1.6 25.7 38.0 0.0
28.1 4.0 74025.1 15.0 9.0 4.0 3.0 3.0 4.0 1.0 0.0 0.0 41.9 38.0 0.0
42.1 4.0 143485.3 15.0 14.0 2.0 0.0 0.0 4.0 1.0 128.0 16.0 40.1 38.0 1.0
42.7 4.0 220958.8 15.0 11.0 2.0 9.0 0.0 4.0 1.0 83062.0 0.0 43.4 38.0 1.0
Range Coverage
Column Range Coverage (%)
age 100.0
fnlwgt 100.0
education 100.0
education-num 100.0
marital-status 100.0
occupation 100.0
relationship 100.0
race 100.0
sex 100.0
capital-gain 100.0
capital-loss 100.0
hours-per-week 100.0
income 100.0
native-country 97.5
workclass 87.5
Mean Range Coverage 99.0
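
The two columns below 100% can be checked against the min and max rows of the descriptive statistics tables in this report. The sketch below assumes each column is scored as one minus the fraction of the original range that the synthetic data misses at either end. The report does not state this formula; it is an assumption that happens to reproduce the reported 87.5 and 97.5. The name `range_coverage` is illustrative only.

```python
def range_coverage(orig_min, orig_max, synth_min, synth_max):
    """Percentage of the original [min, max] range spanned by the synthetic column."""
    span = orig_max - orig_min
    if span == 0:
        return 100.0  # constant column: nothing to miss (assumed convention)
    # Fraction of the original range left uncovered at the low and high ends.
    missed_low = max(synth_min - orig_min, 0) / span
    missed_high = max(orig_max - synth_max, 0) / span
    return max(1 - (missed_low + missed_high), 0) * 100


# min/max values copied from the descriptive statistics tables
workclass = range_coverage(0, 8, 0, 7)         # synthetic max is 7, original max is 8
native_country = range_coverage(0, 40, 0, 39)  # synthetic max is 39, original max is 40
age = range_coverage(17, 90, 17, 90)           # identical ranges

# Mean over all 15 columns: 13 fully covered columns plus the two above.
mean_coverage = (13 * 100.0 + workclass + native_country) / 15
print(round(workclass, 1), round(native_country, 1), round(mean_coverage, 1))
```

Under this assumption, the one category missing at the top of workclass and of native-country accounts for the entire shortfall in both columns.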
Descriptive Statistics for Original Data
index age workclass fnlwgt education education-num marital-status occupation relationship race sex capital-gain capital-loss hours-per-week native-country income
count 4000.00 4000.00 4000.00 4000.00 4000.00 4000.00 4000.00 4000.00 4000.00 4000.00 4000.00 4000.00 4000.00 4000.00 4000.00
mean 38.65 3.86 189037.44 10.35 10.07 2.58 5.72 1.43 3.66 0.67 1110.60 84.77 40.29 35.90 0.24
std 13.60 1.48 101210.51 3.82 2.57 1.49 3.99 1.60 0.86 0.47 7774.42 397.07 12.34 7.39 0.43
min 17.00 0.00 13769.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00
25% 28.00 4.00 119429.50 9.00 9.00 2.00 2.00 0.00 4.00 0.00 0.00 0.00 40.00 38.00 0.00
50% 37.00 4.00 178513.50 11.00 10.00 2.00 6.00 1.00 4.00 1.00 0.00 0.00 40.00 38.00 0.00
75% 47.00 4.00 239548.25 12.00 12.00 4.00 9.00 3.00 4.00 1.00 0.00 0.00 45.00 38.00 0.00
max 90.00 8.00 816750.00 15.00 16.00 6.00 13.00 5.00 4.00 1.00 99999.00 3004.00 99.00 40.00 1.00
Descriptive Statistics for Synthetic Data
index age workclass fnlwgt education education-num marital-status occupation relationship race sex capital-gain capital-loss hours-per-week native-country income
count 4000.00 4000.00 4000.00 4000.00 4000.00 4000.00 4000.00 4000.00 4000.00 4000.00 4000.00 4000.00 4000.00 4000.00 4000.00
mean 38.34 3.64 186132.74 14.72 9.93 2.37 4.50 1.29 3.68 0.70 873.66 72.15 40.51 35.71 0.23
std 11.48 1.55 76989.75 1.70 2.32 1.52 4.22 1.57 0.87 0.46 5952.85 346.02 8.91 8.56 0.42
min 17.00 0.00 13769.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00
25% 29.30 4.00 135897.85 15.00 9.00 2.00 1.00 0.00 4.00 0.00 0.00 0.00 37.90 38.00 0.00
50% 37.60 4.00 177197.20 15.00 10.00 2.00 3.00 1.00 4.00 1.00 84.50 0.00 40.50 38.00 0.00
75% 46.00 4.00 225299.72 15.00 11.80 4.00 9.00 3.00 4.00 1.00 304.45 0.60 43.72 38.00 0.00
max 90.00 7.00 816750.00 15.00 16.00 6.00 13.00 5.00 4.00 1.00 99999.00 3004.00 99.00 39.00 1.00
Comparison of Descriptive Statistics
Bivariate Correlation Matrix
Scatter Plot Comparison
Descriptive Statistics as Radar Chart

Average k-NN Distance for Original Samples: {'count': 40000.0, 'mean': 1109.7683209521285, 'std': 4891.977431626499, 'min': 13.797767368835443, '25%': 134.53981328813265, '50%': 245.28358924039577, '75%': 530.3709275643271, 'max': 181513.18427882335}

Average k-NN Distance for Synthetic Samples: {'count': 40000.0, 'mean': 988.693271797106, 'std': 5430.518367056293, 'min': 29.251918535593084, '25%': 179.89458020199245, '50%': 296.89153588997794, '75%': 626.6842695093417, 'max': 249325.7227115818}

Average Neighbours for Original Samples: {'count': 40000.0, 'mean': 0.5, 'std': 0.5000062501171899, 'min': 0.0, '25%': 0.0, '50%': 0.5, '75%': 1.0, 'max': 1.0}

k-NN Distance Benchmark
NNeighbours for Original Sample

Privacy Matrix Difference: 59.60672863693281

Privacy Matrix

Each privacy risk below is estimated from the success rates of three attacks:

Main: the main privacy attack, in which the attacker uses the synthetic data to guess information about records in the original data.

Baseline: the baseline attack, which models a naive attacker who ignores the synthetic data and guesses randomly.

Control: the control privacy attack, in which the attacker uses the synthetic data to guess information about records in the control dataset.

Singling Out Results

Overall Singling Out PrivacyRisk(value=0.11902156657798059, ci=(0.0, 0.2431128648195115))

Main: SuccessRate(value=0.18425818370534194, error=0.10088397692500792)

Baseline: SuccessRate(value=0.035673799566679355, error=0.03567379956667936)

Control: SuccessRate(value=0.07405018630701342, error=0.0624302257663069)

Linkability Results

Overall Linkage PrivacyRisk(value=0.019260130035860284, ci=(0.0, 0.08314540970956837))

Main: SuccessRate(value=0.05424684758401218, error=0.05070758831236596)

Baseline: SuccessRate(value=0.035673799566679355, error=0.03567379956667936)

Control: SuccessRate(value=0.035673799566679355, error=0.03567379956667936)
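
The overall risk values reported for Singling Out and Linkability are consistent with a residual-risk calculation that discounts the control attack from the main attack: risk = (main - control) / (1 - control). The report does not state this formula; it is an assumption inferred from the reported numbers. The name `residual_risk` is illustrative only. A minimal sketch:

```python
def residual_risk(main_rate, control_rate):
    """Excess success of the main attack over the control attack, rescaled to [0, 1]."""
    # Assumed formula; clipped at 0 so a main attack weaker than control gives no risk.
    return max((main_rate - control_rate) / (1.0 - control_rate), 0.0)


# Success rates copied from the Main and Control lines of this report
singling_out = residual_risk(0.18425818370534194, 0.07405018630701342)
linkability = residual_risk(0.05424684758401218, 0.035673799566679355)

print(round(singling_out, 4), round(linkability, 4))
```

Under this reading, the control attack measures how much an attacker learns from the synthetic data about the population in general, so only the success beyond that level counts as risk specific to the original records.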

Inference Results

Inference Attack
Original Data Classification Report
index precision recall f1-score support
0 0.88 0.94 0.91 757.00
1 0.77 0.58 0.66 243.00
accuracy 0.86 0.86 0.86 0.86
macro avg 0.82 0.76 0.78 1000.00
weighted avg 0.85 0.86 0.85 1000.00
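
The macro avg and weighted avg rows can be recomputed from the per-class rows. The sketch below does this for the f1-score column of the table above: macro averaging weights both classes equally, while weighted averaging weights each class by its support. Because the inputs are the rounded values shown in the report, the results may differ from the reported rows in the last digit.

```python
# Per-class rows from the Original Data Classification Report: (f1-score, support)
per_class = {0: (0.91, 757), 1: (0.66, 243)}

total_support = sum(support for _, support in per_class.values())

# Macro average: unweighted mean over classes.
macro_f1 = sum(f1 for f1, _ in per_class.values()) / len(per_class)

# Weighted average: mean over classes, weighted by class support.
weighted_f1 = sum(f1 * support for f1, support in per_class.values()) / total_support

print(total_support, round(macro_f1, 3), round(weighted_f1, 3))
```

The gap between the two averages reflects the class imbalance: class 0 has about three times the support of class 1 and a higher f1-score, which pulls the weighted average up.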
Synthetic Data Classification Report
index precision recall f1-score support
0 0.87 0.93 0.90 757.00
1 0.71 0.56 0.63 243.00
accuracy 0.84 0.84 0.84 0.84
macro avg 0.79 0.74 0.76 1000.00
weighted avg 0.83 0.84 0.83 1000.00
ROC Curve
Data Discriminator Original X Synthetic
index precision recall f1-score support
0 0.99 0.99 0.99 804.00
1 0.99 0.99 0.99 796.00
accuracy 0.99 0.99 0.99 0.99
macro avg 0.99 0.99 0.99 1600.00
weighted avg 0.99 0.99 0.99 1600.00
Feature Importance Original X Synthetic
Data Discriminator Original X Holdout
index precision recall f1-score support
0 0.73 0.63 0.68 199.00
1 0.68 0.77 0.72 201.00
accuracy 0.70 0.70 0.70 0.70
macro avg 0.70 0.70 0.70 400.00
weighted avg 0.70 0.70 0.70 400.00
Feature Importance Original X Holdout
Data Discriminator Synthetic X Holdout
index precision recall f1-score support
0 0.98 0.98 0.98 199.00
1 0.98 0.98 0.98 201.00
accuracy 0.98 0.98 0.98 0.98
macro avg 0.98 0.98 0.98 400.00
weighted avg 0.98 0.98 0.98 400.00
Feature Importance Synthetic X Holdout